Improved Simulation of Stabilizer Circuits 



00 
O 

o 

(N 

00 



Scott AaronsoiO 
MIT 

Daniel Gottesmaqll 
Perimeter Institute 

The Gottesman-Knill theorem says that a stabihzer circuit — that is, a quantum circuit consisting 
solely of CNOT, Hadamard, and phase gates — can be simulated efficiently on a classical computer. 
This paper improves that theorem in several directions. First, by removing the need for Gaussian 
elimination, we make the simulation algorithm much faster at the cost of a factor-2 increase in 
the number of bits needed to represent a state. We have implemented the improved algorithm in 
a freely-available program called CHP (CNOT-Hadamard-Phase), which can handle thousands of 
qubits easily. Second, we show that the problem of simulating stabilizer circuits is complete for the 
classical complexity class ©L, which means that stabilizer circuits are probably not even universal 
for classical computation. Third, we give efficient algorithms for computing the inner product 
between two stabilizer states, putting any n-qubit stabilizer circuit into a "canonical form" that 
requires at most O (n /logn) gates, and other useful tasks. Fourth, we extend our simulation 
algorithm to circuits acting on mixed states, circuits containing a limited number of non-stabilizer 
gates, and circuits acting on general tensor-product initial states but containing only a limited 
number of measurements. 
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PACS numbers: 03.67.Lx, 03.67.Pp, 02.70.-c 
I. INTRODUCTION 

Among the many difficulties that quantum computer 
architects face, one of them is almost intrinsic to the task 
at hand: how do you design and debug circuits that you 
can't even simulate efficiently with existing tools? Ob- 
viously, if a quantum computer output the factors of a 
3000-digit number, then you wouldn't need to simulate it 
to verify its correctness, since multiplying is easier than 
factoring. But what if the quantum computer didn't 
work? Ordinarily architects might debug a computer by 
adding test conditions, monitoring registers, halting at 
intermediate steps, and so on. But for a quantum com- 
puter, all of these standard techniques would probably 
entail measurements that destroy coherence. Besides, it 
would be nice to design and debug a quantum computer 
using classical CAD tools, before trying to implement it! 

Quantum architecture is one motivation for studying 
classical algorithms to simulate and manipulate quan- 
tum circuits, but it is not the only motivation. Chemists 
and physicists have long needed to simulate quantum sys- 
tems, and they have not had the patience to wait for a 
quantum computer to be built. Instead, they have devel- 
oped limited techniques such as Quantum Monte-Carlo 
(QMC) [l| for computing properties of certain ground 
states. More recently, several general-purpose quantum 
computer simulators have appeared, including Oemer's 
quantum programming language QCL [3|, the QuIDD 
(Quantum Information Decision Diagrams) package of 
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Viamontes et al. [3|, [^ , and the parallel quantum com- 
puter simulator of Obenland and Despain [5] . The draw- 
back of such simulators, of course, is that their running 
time grows exponentially in the number of qubits. This 
is true not only in the worst case but in practice. For 
example, even though it uses a variant of binary decision 
diagrams to avoid storing an entire amplitude vector for 
some states, Viamontes et al. [3] report that the QuIDD 
package took more than 22 hours to simulate Grover's al- 
gorithm on 40 qubits. With a general-purpose package, 
then, simulating hundreds or thousands of qubits is out 
of the question. 

A different direction of research has sought to find non- 
trivial classes of quantum circuits that can be simulated 
efficiently on a classical computer. For example, Vidal 
showed that, so long as a quantum computer's state at 
every time step has polynomially-bounded entanglement 
under a measure related to Schmidt rank, the computer 
can be simulated classically in polynomial time. Notably, 
in a follow-up paper p], Vidal actually implemented his 
algorithm and used it to simulate 1-dimensional quantum 
spin chains consisting of hundreds of spins. A second ex- 
ample is a result of Valiant [^] , which reduces the problem 
of simulating a restricted class of quantum computers to 
that of computing the Pfaffian of a matrix. The latter is 
known to be solvable in classical polynomial time. Ter- 
hal and DiVincenzo have shown that Valiant's class 
corresponds to a model of noninteracting fermions. 

There is one class of quantum circuits that is known to 
be simulable in classical polynomial time, that does not 
impose any limit on entanglement, and that arises natu- 
rally in several applications. This is the class of stabilizer 
circuits introduced to analyze quantum error-correcting 
codes [lO, [ill, E, \lM- A stabihzer circuit is simply 
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Phase a|0)+|3|1> Tpl a|0)+ip|1) 

Measurement a|0)+p|1> /^, |a|2|0X0|+|p|2|1X1 1 



FIG. 1: The four types of gate allowed in the stabilizer for- 
malism 



a quantum circuit in which every gate is a controUcd- 
NOT, Hadamard, phase, or 1-qubit measurement gate. 
We call a stabilizer circuit unitary if it does not contain 
measurement gates. Unitary stabilizer circuits are also 
known as Clifford group circuits. 

Stabilizer circuits can be used to perform the encoding 
and decoding steps for a quantum error-correcting code, 
and they play an important role in fault-tolerant circuits. 
However, the stabilizer formalism used to describe these 
circuits has many other applications. This formalism 
is rich enough to encompass most of the "paradoxes" of 
quantum mechanics, including the GHZ (Greenberger- 
Horne-Zeilinger) experiment Il4|. d ense quantum coding 
[15| , and quantum teleportation [l(^ . On the other hand, 
it is not so rich as to preclude efficient simulation by a 
classical computer. That conclusion, sometimes known 
as the Gottesman-Knill theorem, is the starting point for 
the contributions of this paper. 

Our results are as follows. In Section IIIII we give a 
new tableau algorithm for simulating stabilizer circuits 
that is faster than the algorithm directly implied by the 
Gottesman-Knill theorem. By removing the need for 
Gaussian elimination, this algorithm enables measure- 
ments to be simulated in O (n^) steps instead of O (n^) 
(where n is the number of qubits), at a cost of a factor- 
2 increase in the number of bits needed to represent a 
quantum state. 

Section IIVI describes CHP, a high-performance stabi- 
lizer circuit simulator that implements our tableau algo- 
rithm. We present the results of an experiment designed 
to test how CHP's performance is affected by properties 
of the stabilizer circuit being simulated. CHP has al- 
ready found application in simulations of quantum fault- 
tolerance circuits [17]. 

Section |V] proves that the problem of simulating sta- 
bilizer circuits is complete for the classical complexity 
class ®L. Informally, this means that any stabilizer cir- 
cuit can be simulated using CNOT gates alone; the avail- 
ability of Hadamard and phase gates provides at most a 
polynomial advantage. This result removes some of the 
mystery about the Gottesman-Knill theorem by showing 
that stabilizer circuits are unlikely to be capable even of 
universal classical computation. 



In Section IVII we prove a canonical form theorem that 
we expect will have many applications to the study of 
stabilizer circuits. The theorem says that given any sta- 
bilizer circuit, there exists an equivalent stabilizer cir- 
cuit that applies a round of Hadamard gates, followed by 
a round of phase gates, followed by a round of CNOT 
gates, and so on in the sequence H-C-P-C-P-C-H-P-C- 
P-C (where H, C, P stand for Hadamard, CNOT, Phase 
respectively). One immediate corollary, building on a 
result by Patel, Markov, and Hayes [18| and improving 
one by Dehaene and De Moor [l^, is that any stabilizer 
circuit on n qubits has an equivalent circuit with only 
O (n^/logn) gates. 

Finally, Section IVIII extends our simulation algorithm 
to situations beyond the usual one considered in the 
Gottesman-Knill theorem. For example, we show how to 
handle mixed states, without keeping track of pure states 
from which the mixed states are obtainable by discarding 
qubits. We also show how to simulate circuits involv- 
ing a small number of non-stabilizer gates; or involving 
arbitrary tensor-product initial states, but only a small 
number of measurements. Both of these latter two sim- 
ulations take time that is polynomial in the number of 
qubits, but exponential in the number of non-stabilizer 
gates or measurements. Presumably this exponential 
dependence is necessary, since otherwise we could simu- 
late arbitrary quantum computations in classical subex- 
ponential time. 

We conclude in Section rVIIII with some directions for 
further research. 



II. PRELIMINARIES 

We assume familiarity with quantum computing. This 
section provides a crash course on the stabilizer formal- 
ism, confining attention to those aspects we will need. 
See Section 10.5.1 of Nielsen and Chuang |20] for more 
details. 

Throughout this paper we will use the following four 
Pauli matrices: 



/ = 
Y = 



1 

1 
-i 
i 



X = 



1 

1 
1 
-1 



These matrices satisfy the following identities: 
X^ = Y^ = Z^ =1 



XY = 


iZ YZ = 


iX zx = 


iY 


YX = 


-iZ ZY = 


-iX xz = 


-iY 



In particular, every two Pauli matrices either commute or 
anticommute. The rule for whether to include a minus 
sign is the same as that for quaternions, if we replace 
{I,X,Y,Z)hy{l,i,j,k). 

We define the group Vn of n-qubit Pauli operators to 
consist of all tensor products of n Pauli matrices, together 



with a multiplicative factor of ±1 or ±i (so the total 
number of operators is jT'nl = 4"+^). We omit tensor 
product signs for brevity; thus —YZZI should be read 
—Y®Z®Z®1 (we will use + to represent the Pauli group 
operation). Given two Pauli operators P — i^P\ ■ ■ ■ Pn 
and Q = i'Qi • • • Qn, it is immediate that P commutes 
with Q if and only if the number of indices j G {1, . . . , n} 
such that Pj anticommutes with Qj is even; otherwise P 
anticommutes with Q. Also, for all P G Vn, HP has 
a phase of ±1 then P + P = I ■ ■ ■ I, whereas if P has a 
phase of ±i then P + P = —I ■I. 

Given a pure quantum state \ip), we say a unitary ma- 
trix U stabilizes \Tp) if \rp) is an eigenvector of U with 
eigenvalue 1, or equivalently ii U \Tp) — {ip) where we do 
not ignore global phase. To illustrate, the following ta- 
ble lists the Pauli matrices and their opposites, together 
with the unique 1-qubit states that they stabilize: 



X: |0) + |1> ~X 

Y : |0)+z|l) -Y 
Z : |0) -Z 



|o>-|i> 

|0>-z|l) 

ID 



The identity matrix / stabilizes all states, whereas — / 
stabilizes no states. 

The key idea of the stabilizer formalism is to repre- 
sent a quantum state Jt/i), not by a vector of amplitudes, 
but by a stabilizer group, consisting of unitary matrices 
that stabilize {ip)- Notice that if U and V both stabilize 
IV') then so do UV and U'-^, and thus the set Stab (|'0)) 
of stabilizers of \ip) is a group. Also, it is not hard to 
show that ii\ip) ^\lp) then Stab (IV-)) 7^ Stab (| (/?)). But 
why does this strange representation buy us anything? 
To write down generators for Stab(|'(/')) (even approxi- 
mately) still takes exponentially many bits in general by 
an information-theoretic argument. Indeed stabilizers 
seem worse than amplitude vectors, since they require 
about 2^" parameters to specify instead of about 2"! 

Remarkably, though, a large and interesting class 
of quantum states can be specified uniquely by much 
smaller stabilizer groups — specifically, the intersection of 
Stab dV")) with the Pauh group dH, [ll [ll. This class 
of states, which arises in quantum error correction and 
many other settings, is characterized by the following the- 
orem. 

Theorem 1 Given an n-qubit state lip), the following are 
equivalent: 

(i) IV') can be obtained from |0) " by CNOT, 
Hadamard, and phase gates only. 

(ii) IV') can be obtained from |0) " by CNOT, 
Hadamard, phase, and measurement gates only. 

(Hi) IV') is stabilized by exactly 2" Pauli operators. 

(iv) \ip) is uniquely determined by S {\ip)) — Stab (\ip))r] 
Vn, or the group of Pauli operators that stabilize 

Because of Theorem [1] we call any circuit consisting 
entirely of CNOT, Hadamard, phase, and measurement 



gates a stabilizer circuit, and any state obtainable by 
applying a stabilizer circuit to |0) "a stabilizer state. As 
a warmup to our later results, the following proposition 
counts the number of stabilizer states. 

Proposition 2 Let N be the number of pure stabilizer 
states on n qubits. Then 



N 



2'n 



■^n — k 



2(l/2+o(l))n2 



Proof. We have N = G/A, where G is the total num- 
ber of generating sets and A is the number of equivalent 
generating sets for a given stabilizer S. To find G, note 
that there are 4" — 1 choices for the first generator Mi 
(ignoring overall sign), because it can be anything but 
the identity. The second generator must commute with 
Ml and cannot be / or Mi, so there are 4"'/2 — 2 choices 
for M2. Similarly, M3 must commute with Mi and M2, 
but cannot be in the group generated by them, so there 
are 4"/4 — 4 choices for it, and so on. Hence, including 
overall signs, 

n-l , s n-1 



fe=0 



fe=0 



Similarly, to find A, note that given S, there are 2" — 1 
choices for Mi , 2" — 2 choices for M2 , 2" — 4 choices for 
M3, and so on. Thus 

n — 1 n — 1 

^ = n (2" ~ 2'') "" 2"(""i)/2 j]^ {T-^ - 1) . 



fe=0 



fc=0 



Therefore 



G _,„ VrV4"-'=-l 



^=T=2"n 



A "-•--'- V2"-''' 

fe=0 



n-l 



~ ) = 2" n (2""''' + 1) 



fc=0 



III. 



EFFICIENT SIMULATION OF STABILIZER 
CIRCUITS 



Theorem[l]immediately suggests a way to simulate sta- 
bilizer circuits efficiently on a classical computer. A well- 
known fact from group theory says that any finite group 
G has a generating set of size at most log2 |G|. So if IV') 
is a stabilizer state on n qubits, then the group S'(|V')) 
of Pauli operators that stabilize |V') has a generating set 
of size n = log2 2". Each generator takes 2n -I- 1 bits to 
specify: 2 bits for each of the n Pauli matrices, and 1 bit 
for the phase [40|- So the total number of bits needed 
to specify |V') is n (2n -t- 1). What Gottesman and Knill 
showed, furthermore, is that these bits can be updated 
in polynomial time after a CNOT, Hadamard, phase, or 
measurement gate is applied to iV"). The updates cor- 
responding to unitary gates are very efficient, requiring 
only O (n) time for each gate. 



However, the updates corresponding to measurements 
are not so efficient. We can decide in O (n) time whether 
a measurement of qubit a will yield a deterministic or 
random outcome. If the outcome is random, then up- 
dating the state after the measurement takes O (n^) 
time, but if the outcome is deterministic, then decid- 
ing whether the outcome is |0) or |1) seems to require 
inverting an nx n matrix, which takes O fn^'^^^) time in 
theory [21[ but order n^ time in practice. What that n^ 
complexity means is that simulations of, say, 2000-qubit 
systems would already be prohibitive on a desktop PC, 
given that measurements are frequent. 

This section describes a new simulation algorithm, by 
which both deterministic and random measurements can 
be performed in O (n^) time. The cost is a factor-2 in- 
crease in the number of bits needed to specify a state. 
For in addition to the n stabilizer generators, we now 
store n "destabilizer" generators, which are Pauli opera- 
tors that together with the stabilizer generators generate 
the full Pauli group Vn- So the number of bits needed 
is2n{2n + l) « 4n^ 

The algorithm represents a state by a tableau consist- 
ing of binary variables Xij,Zij for all i € |l,...,2n}, 
j e {1, . . . , n}, and n for aU i g {1, . . . , 2n} [41]: 



\ 





Zll ■ ■ ■ Zin 
Znl ' ' ' Znn 




\ 2;(2n)i • • • a;(2„)„ 


2(n+l)l • • • 2(n+l)n 
Z(2n)l ■ ■ ■ 2(2n)n 


rn+1 
r2n 



J 

Rows 1 to n of the tableau represent the destabilizer gen- 
erators i?i, . . . , Rn, and rows n -I- 1 to 2n represent the 
stabilizer generators i?„+i, . . . , i?2n- If Ri ~ ±-Pi ■ ■ ■ Pn, 
then bits Xij,Zij determine the j*^ Pauli matrix Pj: 00 
means /, 01 means X, 11 means Y, and 10 means Z. 
Finally, r^ is 1 if Ri has negative phase and if r^ has 
positive phase. As an example, the 2-qubit state |00) 
is stabilized by the Pauli operators +ZI and +IZ, so a 
possible tableau for |00) is 



/ 1 





o\ 


1 











1 





V 


1 


0/ 



Indeed, we will take the obvious generalization of the 
above "identity matrix" to be the standard initial 
tableau. 

The algorithm uses a subroutine called rowsum(/i, i), 
which sets generator h equal to i + h. Its purpose is to 
keep track, in particular, of the phase bit r^, including 
all the factors of i that appear when multiplying Pauli 
matrices. The subroutine is implemented as follows. 

ro"wsum(/i, z): Let g {xi, zi,X2, Z2) be a function that 
takes 4 bits as input, and that returns the exponent to 



which i is raised (either 0, 1, or —1) when the Pauli 
matrices represented by xizi and X2Z2 are multiplied. 
More explicitly, if a;i = zi = then g = 0; if xi = 
zi = I then g = Z2 — X2', if xi — 1 and zi — then 
g ~ Z2 (2x2 ^ 1); and if xi =0 and zi = 1 then g = 
X2{l-2z2). Then set r/j := if 

n 

2rh + 2r, + ^g {xij , z^ ,Xhj, Zuj ) = (mod 4) , 

and set r;i := 1 if the sum is congruent to 2 mod 4 (it 
will never be congruent to 1 or 3). Next, for all j G 



{1, . . . ,n}, set Xhj ■■= Xi 
(here and throughout, (E 



© Xhj and set z^j := 2 
denotes exclusive-OR). 



ffi Zhj 



We now give the algorithm. It will be convenient to 
add an additional (2n + lY row for "scratch space." The 
initial state |0) " has r, = for all z e {1, . . . , 2n + 1}, 



and Xi 



5ij and Zy 



'{i-n)j 



for aUie {1, . . . , 2ri -I- 1} 



and j G {1, . . . , n}, where 6ij is 1 if i = j and otherwise. 
The algorithm proceeds through the gates in order; for 
each one it does one of the following depending on the 
gate type. 

CNOT from control a to target h. For all i G 

|1, . . . , 2rj|, set r.j :=: Vi © XiaZn, (a^ib © Zia © Ij, xn, :^ 

Xn, © Xia, and Zia -^ Zia © Zii,. 

Hadamard on qubit a. For all i G {l,...,2n}, set 

Tj := r^ © XiaZia and swap x^a with Z^a■ 

Phase on qubit a. For all z G {1, . . . , 2n}, set r^ :— 

fi © Xia Zia and then set Zia -^ Zia © Xia- 

Measurement of qubit a in standard basis. First 
check whether there exists a. p G {n + 1, . . . , 2n} such 
Lnar Xpa ~~ ^ ■ 

Case I: Such a p exists (if more than one exists, then 
let p be the smallest). In this case the measurement 
outcome is random, so the state needs to be updated. 
This is done as follows. First call rowsum(z,p) for all 
i G {1, . . . , 2n} such that i y^ p and Xia = 1. Second, set 
entire the [p — n) row equal to the p^^ row. Third, set 
the p*'* row to be identically 0, except that Vp is or 1 
with equal probability, and Zpa — 1. Finally, return rp 
as the measurement outcome. 

Case II: Such an p does not exist. In this case the 
outcome is determinate, so measuring the state will not 
change it; the only task is to determine whether or 
1 is observed. This is done as follows. First set 
the (2n-|-l)'' row to be identically 0. Second, call 
rowsum (2n + 1, i + n) for all i G {l,...,n} such that 
Xia — 1- Finally return r2n+i as the measurement out- 
come. 



Once we interpret the x. 



ij , ^ij , 



and ri bits for i > n + 
1 as representing generators of S {\ip)), and rowsum as 
representing the group operation in Vn , the correctness of 
the CNOT, Hadamard, phase, and random measurement 
procedures follows immediately from previous analyses 
by Gottesman 13]. It remains only to explain why the 



determinate measurement procedure is correct. Observe 
that Rfi commutes with Ri if the symplectic inner product 



Rh ■ Ri 



XhlZil 



* •^hn^in u7 ^il^hl 



' •^in^hn 



equals 0, and anticomniutes with Ri if Rh-Ri = 1. Using 
that fact it is not hard to show the following. 

Proposition 3 The following are invariants of the 
tableau algorithm: 

(i) Rn+i, ■ ■ • , R271 generate S Qip)), and Ri,..., i?2n 
generate Vn ■ 

(a) Ri, . . . , Rn commute. 

(Hi) For all h G {1, . . . ,n}, Rh anticommutes with 

Rh+n ■ 

(iv) For all i,h <E {!,..., n} such that i ^ h, Ri com- 
mutes with Rh+n- 

Now suppose that a measurement of qubit a yields a 
determinate outcome. Then the Za operator must com- 
mute with all elements of the stabilizer, so 



/ ^ ChRh+n — ±Za 



for a unique choice of ci, . . . , c„ £ {0, 1}. Our goal is to 
determine the c?i's, since then by summing the appropri- 
ate Rh+nS we can learn whether the phase representing 
the outcome is positive or negative. Notice that for all 
i € {l,...,n}, 



IV. IMPLEMENTATION AND EXPERIMENTS 

We have implemented the tableau algorithm of Sec- 
tion [m] in a C program called CHP (CNOT-Hadamard- 
Phase), which is available for download \^. CHP 
takes as input a program in a simple "quantum assembly 
language," consisting of four instructions: cab (apply 
CNOT from control a to target 6) , h a (apply Hadamard 
to a), p a (apply phase gate to a), and m a (measure a 
in the standard basis, output the result, and update the 
state accordingly). Here a and b are nonnegative inte- 
gers indexing qubits; the maximum a or b that occurs in 
any instruction is assumed to be n — 1, where n is the 
number of qubits. As an example, the following program 
demonstrates the famous quantum teleportation protocol 
of Bennett et al. [lg| : 

hi I EPR pair is prepared (qubit 1 is 
c 1 2 J Ahce's half; qubit 2 is Bob's half) 

Alice interacts qubit (the state to 
be teleported) with her half of the 
EPR pair 

Alice sends 2 classical bits to Bob 



Bob uses the bits from Alice to 
recover the teleported state 



h=l 



Ch {R. ■ R. 



h+n 



= R, 



h=l 



ChRh+n = Ri-Za (mod 2) 



by Proposition [3] Therefore by checking whether Ri 
anticommutes with Za — which it does if and only if 
Xia = 1 — we learn whether c; = 1 and thus whether 
rowsum (2n -I- 1, i -I- n) needs to be called. 

We end this section by explaining how to compute the 
inner product between two stabilizer states |V') and |(^), 
given their full tableaus. The inner product is if the 
stabilizers contain the same Pauli operator with oppo- 
site signs. Otherwise it equals 2~*/^, where s is the 
minimum, over all sets of generators {Gi, . . . ,C?„} for 
Stab (IV')) and {Hi, . . . , if„} for Stab (|i^)), of the num- 
ber of i for which Gi ^ Hi. For example, {XX, ZZ) 
and {ZI,IZ) have inner product l/-\/2, since {ZI,IZ) — 
{ZI, ZZ) . The proof is easy: it suffices to observe that 
neither the inner product nor s is affected if we transform 
\ip) and \ip) to C/ IV') and U \(p) respectively, for some uni- 
tary U such that U \->jj) = |0) " has the trivial stabihzer. 
This same observation yields an algorithm to compute 
the inner product: first transform the tableau of |V') to 
that oi U \^p) — \0) " using Theorem [51 then perform 
Gaussian elimination on the tableau of U \ip) to obtain s. 
Unfortunately, this algorithm takes order n^ steps. 



We also have available CHP programs that demon- 
strate the Bennett- Wiesner dense quantum coding proto- 
col |15|. the GHZ (Greenberger-Horne-Zeilinger) experi- 
ment [1J|, Simon's algorithm [231. and the Shor 9-qubit 
quantum error-correcting code [23] . 

Our main design goal for CHP was high performance 
with a large number of qubits and frequent measure- 
ments. The only reason to use CHP instead of a general- 
purpose quantum computer simulator such as QuIDD [3| 
or QCL [2| is performance, so we wanted to leverage that 
advantage and make thousands of qubits easily simula- 
ble rather than just hundreds. Also, the results of Sec- 
tion |V] suggest that classical postprocessing is unavoid- 
able for stabilizer circuits, since stabilizer circuits are not 
even universal for classical computation. So if we want 
to simulate (for example) Simon's algorithm, then one 
measurement is needed for each bit of the first register. 
CHP's execution time will be dominated by these mea- 
surements, since as discussed in Section [HIl each unitary 
gate takes only O (n) time to simulate. 

Our experimental results, summarized in Figure [H 
show that CHP makes practical the simulation of ar- 
bitrary stabilizer circuits on up to about 3000 qubits. 
Since the number of bits needed to represent n qubits 
grows quadratically in n, the main limitation is available 



memory. On a machine with 256MB of RAM, CHP can 
handle up to about 20000 qubits before virtual memory 
is needed, in which case thrashing makes its performance 
intolerable. The original version of CHP required 'Sn^ 
bits for memory; we were able to reduce this to ~4ri^ 
bits, enabling a 41% increase in the number of qubits 
for a fixed memory size. More trivially, we obtained 
an eightfold improvement in memory by storing 8 bits 
to each byte instead of 1. Not only did that change 
increase the number of storable qubits by 183%, but it 
also made CHP about 50% faster — presumably because 
(1) the rowsum subroutine now needed to exclusive-OR 
only 1/8 as many bytes, and (2) the memory penalty 
was reduced. Storing the bits in 32-bit words yielded 
a further 10% performance gain, presumably because of 
(1) rather than (2) (since even with byte-addressing, a 
whole memory line is loaded into the cache on a cache 
miss). 

As expected, the experimentally measured execution 
time per unitary gate grows linearly in n, whereas the 
time per measurement grows somewhere between linearly 
and quadratically, depending on the states being mea- 
sured. Thus the time needed for measurements gener- 
ally dominates execution time. So the key question is 
this: what properties of a circuit determine whether the 
time per measurement is linear, quadratic, or somewhere 
in between? To investigate this question we performed 
the following experiment. 

We randomly generated stabilizer circuits on n qubits, 
for n ranging from 200 to 3200 in increments of 200. For 
each 71, we used the following distribution over circuits: 
Fix a parameter (3 > 0; then choose \_f3n logj n\ random 
unitary gates: a CNOT from control a to target b, a 
Hadamard on quhit a, or a phase gate on qubit a, each 
with probability 1/3, where a and b are drawn uniformly 
at random from {1, . . . , n} subject to a ^h. Then mea- 
sure qubit a for each a £ {1, . . . , n} m sequence. 

We simulated the resulting circuits in CHP. For each 
circuit, we counted the number of seconds needed for all n 
measurement steps (ignoring the time for unitary gates) , 
then divided by n to obtain the number of seconds per 
measurement. We repeated the whole procedure for /3 
ranging from 0.6 to 1.2 in increments of 0.1. 

There were several reasons for placing measurements 
at the end of a circuit rather than interspersing them with 
unitary gates. First, doing so models how many quan- 
tum algorithms actually work (apply unitary gates, then 
measure, then perform classical postprocessing); second, 
it allowed us to ignore the effect of measurements on sub- 
sequent computation; third, it 'standardized' the mea- 
surement stage, making comparisons between different 
circuits more meaningful; and fourth, it made simulation 
harder by increasing the propensity for the measurements 
to be nontrivially correlated. 

The decision to make the number of unitary gates pro- 
portional to n log n was based on the following heuristic 
argument. The time needed to simulate a measurement 
is determined by how many times the rowsum procedure 



is called, which in turn is determined by how many i's 
there are such that Xia = 1 (where a is the qubit being 
measured). Initially Xia = 1 if and only if a = z, so a 
measurement takes O (n) time. For a random state, by 
contrast, the expected number of i's such that Xia = 1 is 
n by symmetry, so a measurement takes order n^ time. 
In general, the more I's there are in the tableau, the 
longer measurements take. But where does the transi- 
tion from linear to quadratic time occur, and how sharp 
is it? 

Consider n people, each of whom initially knows one se- 
cret (with no two people knowing the same secret). Each 
day, two people chosen uniformly at random meet and ex- 
change all the secrets they know. What is the expected 
number of days until everyone knows everyone else's se- 
crets? Intuitively, the answer is O (nlogn), because any 
given person has to wait Q (n) days between meetings, 
and at each meeting, the number of secrets he knows ap- 
proximately doubles (or towards the end, the number of 
secrets he doesn't know is approximately halved). Re- 
placing people by qubits and meetings by CNOT gates, 
one can see why a 'phase transition' from a sparse to a 
dense tableau might occur after Q (nlogn) random uni- 
tary gates are applied. However, this argument does not 
pin down the proportionality constant /3, so that is what 
we varied in the experiment. 

The results of the experiment are presented in Figure 
[2l When (3 — 0.6, the time per measurement appears 
to grow roughly linearly in n, whereas when /? = 1.2 
(meaning that the number of unitary gates has only dou- 
bled) , the time per measurement appears to grow roughly 
quadratically, so that running the simulations took 4 
hours of computing time 43]. Thus, Figure [5] gives strik- 
ing evidence for a "phase transition" in simulation time, 
as increasing the number of unitary gates by only a con- 
stant factor shifts us from a regime of simple states that 
are easy to measure, to a regime of complicated states 
that are hard to measure. This result demonstrates that 
CHP's performance depends strongly on the circuit be- 
ing simulated. Without knowing what sort of tableaus 
a circuit will produce, all we can say is that the time 
per measurement will be somewhere between linear and 
quadratic in n. 



V. COMPLEXITY OF SIMULATING 
STABILIZER CIRCUITS 

The Gottesman-Knill theorem shows that stabilizer 
circuits are not universal for quantum computation, un- 
less quantum computers can be simulated efficiently by 
classical ones. To a computer scientist, this theorem 
immediately raises a question: where do stabilizer cir- 
cuits sit in the hierarchy of computational complexity 
theory? In this section we resolve that question, by 
proving that the problem of simulating stabilizer circuits 
is complete for a classical complexity class known as ©L 
(pronounced "parity-L") [4J|. The usual definition of 
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FIG. 2: Average time needed to simulate a measurement after 
applying /Snlogjn unitary gates to n qubits, on a 650MHz 
Pentium III with 256MB RAM. 



0L is as the class of all problems that are solvable by a 
nondeterministic logarithmic-space Turing machine, that 
accepts if and only if the total number of accepting paths 
is odd. But there is an alternate definition that is prob- 
ably more intuitive to non-computer-scientists. This is 
that ©L is the class of problems that reduce to simulating 
a polynomial-size CNOT circuit, i.e. a circuit composed 
entirely of NOT and CNOT gates, acting on the initial 
state |0 • • • 0). (It is easy to show that the two definitions 
are equivalent, but this would require us first to explain 
what the usual definition meansl) 

From the second definition, it is clear that ®L C P; in 
other words, any problem reducible to simulating CNOT 
circuits is also solvable in polynomial time on a classical 
computer. But this raises a question: what do we mean 
by "reducible"? Problem A is reducible to problem B 
if any instance of problem A can be transformed into 
an instance of problem B; this means that problem B 
is "harder" than problem A in the sense that the ability 
to answer an arbitrary instance of problem B implies the 
ability to answer an arbitrary instance of problem A (but 
not necessarily vice- versa). 

We must, however, insist that the reduction transform- 
ing instances of problem A into instances of problem B 
not be too difficult to perform. Otherwise, we could re- 
duce hard problems to easy ones by doing all the difficult 
work in the reduction itself. In the case of ©L, we cannot 
mean "reducible in polynomial time," which is a common 
restriction, since then the reduction would be at least as 
powerful as the problem it reduces to! Instead we require 
the reduction to be performed in the complexity class L, 
or logarithmic space — that is, by a Turing machine M 
that is given a read-only input of size n, and a write-only 
output tape, but only O (logn) bits of read/write mem- 



ory. The reduction works as follows: first M specifies a 
CNOT circuit on its output tape; then an "oracle" tells 
M the circuit's output (which we can take to be, say, 
the value of the first qubit after the circuit is applied), 
then M specifies another CNOT circuit on its output 
tape, and so on. A useful result of Hertrampf, Reith, 
and VoUmer [25] says that this seemingly powerful kind 
of reduction, in which M can make multiple calls to the 
CNOT oracle, is actually no more powerful than the kind 
with only one oracle call. (In complexity language, what 
(25| showed is that ©L = L®"-: any problem in L with ©L 
oracle is also in ©L itself.) 

It is conjectured that L ^ ©L; in other words, that an 
oracle for simulating CNOT circuits would let an L ma- 
chine compute more functions than it could otherwise. 
Intuitively, this is because writing down the intermediate 
states of such a circuit requires more than a logarithmic 
number of read/write bits. Indeed, ©L contains some 
surprisingly "hard" problems, such as inverting matrices 
over GF2 [2J]. On the other hand, it is also conjectured 
that ©L ^ P, meaning that even with an oracle for sim- 
ulating CNOT circuits, an L machine could not simulate 
more general circuits with AND and OR gates. As usual 
in complexity theory, neither conjecture has been proved. 

Now define the Gottesman-Knill problem as fol- 
lows. We are given a stabilizer circuit C as a sequence 
of gates of the form CNOT a ^ b, Hadamard a, Phase 
a, or Measure a, where a,b S {l,...,n} are indices of 
qubits. The problem is to decide whether qubit 1 will 
be |1) with certainty after C is applied to the initial state 
|0) ". (If not, then qubit 1 will be |1) with probability 
either 1/2 or 0.) 

Since stabilizer circuits are a generalization of CNOT 
circuits, it is obvious that Gottesman-Knill is ©L- 
hard (i.e. any ©L problem can be reduced to it). Our re- 
sult says that Gottesman-Knill is in ©L. Intuitively, 
this means that any stabilizer circuit can be simulated 
efficiently using CNOT gates alone — the additional avail- 
ability of Hadamard and phase gates gives stabilizer cir- 
cuits at most a polynomial advantage. In our view, this 
surprising fact helps to explain the Gottesman-Knill the- 
orem, by providing strong evidence that stabilizer circuits 
are not even universal for classical computation (assum- 
ing, of course, that classical postprocessing is forbidden). 

Theorem 4 Gottesman-Knill is in ©L. 

Proof. We will show how to solve Gottesman-Knill 
using a logarithmic-space machine M with an oracle for 
simulating CNOT circuits. By the result of Hertrampf, 
Reith, and VoUmer [25| described above, this will suffice 
to prove the theorem. 

By the principle of deferred measurement, we can as- 
sume that the stabilizer circuit C has only a single mea- 
surement gate at the end (say of qubit 1), with all other 
measurements replaced by CNOT's into ancilla qubits. 



In the tableau algorithm of Section IIIIl let x 
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be the values of the variables Xi 



after t gates of C 



have been applied. Then M will simulate C by comput- 
ing these values. The first task of M is to decide whether 
the measurement has a determinate outcome — or equiv- 
alently, whether xl^ =0 for every i € {n + 1, . . . , 2n\, 
where T is the number of unitary gates. Observe that 
in the CNOT, Hadamard, and phase procedures, every 
update to an Xij or z^j variable replaces it by the sum 
modulo 2 of one or two other Xij or Zij variables. Also, 
iterating over all t E {0, . . . ,T} and i G {!,..., 2n} takes 
only O (log n) bits of memory. Therefore, despite its 
memory restriction, M can easily write on its output 
tape a description of a CNOT circuit that simulates the 
tableau algorithm using An^ bits (the r^'s being omitted), 

and that returns x)^^ for any desired i. Then to decide 
whether the measurement outcome is determinate, Af 
simply iterates over all i from n + 1 to 2n. 

The hard part is to decide whether |0) or |1) is mea- 
sured in case the measurement outcome is determinate, 
for this problem involves the r^ variables, which do not 
evolve in a linear way as the x^'s and z^'s do. Even 
worse, it involves the complicated-looking and nonlinear 
rowsum procedure. Fortunately, though, it turns out 

that the measurement outcome ?'2n+i *^^^ tie computed 
by keeping track of a single complex number a. This a 
is a product of phases of the form ±1 or ±i, and there- 
fore takes only 2 bits to specify. Furthermore, although 
the "obvious" ways to compute a use more than O (log n) 
bits of memory, M can get around that by making liberal 
use of the oracle. 

First Af computes what r2„^i would be if the CNOT, 
Hadamard, and phase procedures did not modify the r^'s. 
Let P be a Pauli matrix with a phase of ±1 or ±i, which 
therefore takes 4 bits to specify. Also, let P^- ' be the 

(T) (T) 

Pauli matrix represented by the bits xl, \z\- in the 
usual way: / = 00, X = 10, F = 11, Z = 01. Then 
the procedure is as follows. 

a := 1 

for i := 1 to n 

P =1 

for i := n -|- 1 to 2n 
ask oracle for 

if x^P ,, = 1 then P := p'>P P 

{i—n)l ij 

next i 

multiply a by the phase of P (±1 or ±i) 
next j 

The "answer" is 1 if a — —1 and if a = 1 (note that 
a will never be ±i at the end). However, M also needs 
to account for the r^'s, as follows. 



AT) JT) (T) 



for i 



1 to 2n 



ask oracle for x 



if X 



(T) 



(T) 
(i-n)l 



1 



(i-n)l 

for t := to T - 1 

if (t -f- 1)'' gate is a Hadamard or phase on a 

ask oracle for x\^ , z\J 



''^^xfl^fa = 1 then a:^-a 
end if 
if (t + 1)''* gate is a CNOT from a to 6 



ask oracle for x)J , z]J , x^ , z^ 



•f (t) 



(«) ( 



(*) ^ J*) 



1 



1 then a 



end if 
next t 
end if 
next i 

The measurement outcome, r2n+i i is then 1 if a = —1 
and if a = 1. As described above, the machine M 
needs only O (log n) bits to keep track of the loop indices 
i,j,i, and 0(1) additional bits to keep track of other 
variables. Its correctness follows straightforwardly from 
the correctness of the tableau algorithm. ■ 

For a problem to be ©L-complete simply means that 
it is ®L-hard and in ©L. Thus, a corollary of Theorem 
|4]is that Gottesman-Knill is ©L-complete. 



VI. CANONICAL FORM 

Having studied the simulation of stabilizer circuits, in 
this section we turn our attention to manipulating those 
circuits. This task is of direct relevance to quantum 
computer architecture: because the effects of decoher- 
ence build up over time, it is imperative (even more so 
than for classical circuits) to minimize the number of 
gates as well as wires and other resources. Even if fault- 
tolerant techniques will eventually be used to tame de- 
coherence, there remains the bootstrapping problem of 
building the fault-tolerance hardware! In that regard we 
should point out that fault-tolerance hardware is likely 
to consist mainly of CNOT, Hadamard, and phase gates, 
since the known fault-tolerant constructions (for exam- 
ple, that of Aharonov and Ben- Or [2^) are based on sta- 
bilizer codes. 

Although there has been some previous work on syn- 
thesizing CNOT circuits [ll, |23, [23| and general classical 
reversible circuits ^29,[3^, to our knowledge there has not 
been work on synthesizing stabilizer circuits. In this sec- 
tion we prove a canonical form theorem that is extremely 
useful for stabilizer circuit synthesis. The theorem says 
that given any circuit consisting of CNOT, Hadamard, 
and phase gates, there exists an equivalent circuit that 
applies a round of Hadamard gates only, then a round 
of CNOT gates only, and so on in the sequence H-C-P- 
C-P-C-H-P-C-P-C. One easy corollary of the theorem is 
that any tableau satisfying the commutativity conditions 
of Proposition [3] can be generated by some stabilizer cir- 
cuit. Another corollary is that any unitary stabilizer 
circuit has an equivalent circuit with only O (n^/logrt) 
gates. 

Given two n-qubit unitary stabilizer circuits Ci,C2, we 
say that Ci and C2 are equivalent if Ci (ji/')) = C2 (jf/')) for 
all stabilizer states j-i/'), where Ci (|i/')) is the final state 
when Ci is applied to |V') [4^. By linearity, it is easy to 



see that equivalent stabilizer circuits will behave identi- 
cally on all states, not just stabilizer states. Further- 
more, there exists a one-to-one correspondence between 
circuits and tableaus: 

Lemma 5 Let Ci,C2 be unitary stabilizer circuits, and 
let Ti,T2 be their respective final tableaus when we run 
them on the standard initial tableau. Then Ci and C2 
are equivalent if and only if Ti =Ti- 

Proof. Clearly Ti = 72 if C\ and C2 are equivalent. 
For the other direction, it suffices to observe that a uni- 
tary stabilizer circuit acts linearly on Pauli operators 
(that is, rows of the tableau): if it maps P\ to Q\ and 
P2 to (52, then it maps Pi + P2 to Qi + Q2- Since the 
rows of the standard initial tableau form a basis for Vn, 
the lemma follows. ■ 

Our proof of the canonical form theorem will use the 
following two lemmas. 

Lemma 6 Given an n-qubit stabilizer state, it is always 
possible to apply Hadamard gates to a subset of the qubits 
so as to make the X matrix have full rank (or equiva- 
lently, make all 2" basis states have nonzero amplitude) . 

Proof. We can always perform row additions on the 
n X 2n stabilizer matrix without changing the state that 
it represents. Suppose the X matrix has rank k < n; 
then by Gaussian elimination, we can put the stabilizer 
matrix in the form 



A 




B 

C 



where A is kxn and has rank k. Then since the rows are 
linearly independent, C must have rank n — k; therefore 
it has an (n — k) x [n ~ k) submatrix C2 of full rank. 
Let us permute the columns of the X and Z matrices 
simultaneously to obtain 





and then perform Gaussian elimination on the bottom 
n — k rows to obtain 





Now commutativity relations imply 

Ai A2 




and therefore AiD^ = A2. Notice that this implies 
that the k x k matrix Ai has full rank, since otherwise 
the X matrix would have column rank less than k. So 




performing Hadamards on the rightmost n 
yields a state 



Bi A2 
D 



whose X matrix has full rank. ■ 

Lemma 7 For any symmetric matrix A G 
exists a diagonal matrix A such that A+A = 
M some invertible binary matrix. 



k qubits 



Z^^", there 



with 



Proof. We will let M be a lower-triangular matrix 
with Is all along the diagonal: 



M^j =0 i <j 



(1) 
(2) 



Such an M is always invertible. Then there exists a 
diagonal A such that A + A = MM^ if and only if 



A,j = ^ M,kM. 



jk 



(3) 



for all pairs («, j) with i > j. (We pick A appropriately 
to satisfy the equations for An automatically, and both 
sides of the equation are symmetric, covering the cases 
with i < j.) 

We will perform induction on i and j to solve for the 
undetermined elements of M . For the base case, we know 



that Mil 



1. We will determine Mij for i > j by 



supposing we have already determined Mi/ji for either 
*' < i, j' < j or i' < i, j' < j. We consider equation ^ 
for Aij and note that MikMjk = unless k < j. Then 



k<j 



(4) 



By the induction hypothesis, we have already determined 
in the sum both Mik (since k < j) and Mjk (since j < i 
and k < j), so this equation uniquely determines Mij. 
We can thus find a unique M that satisfies ^ for all 
i> j. m 

Say a unitary stabilizer circuit is in canonical form if 
it consists of 11 rounds in the sequence H-C-P-C-P-C-H- 
P-C-P-C. 

Theorem 8 Any unitary stabilizer circuit has an equiv- 
alent circuit in canonical form. 

Proof. Divide a 2n x 2n tableau into four n x n ma- 
trices A = (dij), B = {hij), C = {cij), and D — (dij), 
containing the destabilizer Xij bits, destabilizer z^ bits, 
stabilizer x^ bits. 



and stabilizer Zij bits respectively: 



A 



C 



B 



D 



(We can ignore the phase bits r^.) Since unitary circuits 
are reversible, by Lemma [S] it suffices to show how to 
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obtain the standard initial tableau starting from an ar- 
bitrary A, B, C, D by applying CNOT, Hadamard, and 
phase gates [4^. We cannot use row additions, since al- 
though they leave states invariant they do not in general 
leave circuits invariant. 

The procedure is as follows. 

(1) Use Hadamards to make C have full rank (this is 
possible by Lemma [51). 

(2) Use CNOT's to perform Gaussian elimination on C, 
producing 



B 



D 



(10) Use phases to produce 



N 







C 



then by commutativity relations, NC^ — I . Next apply 
two phases each to some subset of qubits in order to 
preserve the above tableau, but set ri = • • • = r„ = 0. 
(11) Use CNOT's to produce 



/ 







(3) Commutativity of the stabilizer implies that ID^ is 
symmetric, therefore D is symmetric, and we can ap- 
ply phase gates to add a diagonal matrix to D and use 
Lemma [7] to convert D to the form D ~ MM^ for some 
invertible M. 

(4) Use CNOT's to produce 



A 



M 



B 



M 



Note that when we map / to IM, we also map D to 

D {M^y' = MM^ (M^y^ = M. 

(5) Apply phases to all n qubits to obtain 



A 



M 



B 







Since M is full rank, there exists some subset S of qubits 

such that applying two phases in succession to every a G 

S will preserve the above tableau, but set r„+i = • • • = 

f2n — 0. Apply two phases to every a € S. 

(6) Use CNOT's to perform Gaussian elimination on M, 

producing 



A 



I 



B 







By commutativity relations, IB^ 

B^I. 

(7) Use Hadamards to produce 



AO^ 



/, therefore 



/ 



A 







(8) Now commutativity of the destabilizer implies that 
A is symmetric, therefore we can again use phase gates 
and Lemma [7] to make A = NN"^ for some invertible N . 

(9) Use CNOT's to produce 



N 



N 



C 



Since Theorem [8] relied only on a tableau satisfying the 
commutativity conditions, not on its being generated by 
some stabilizer circuit, an immediate corollary is that any 
tableau satisfying the conditions is generated by some 
stabilizer circuit. We can also use Theorem [8] to answer 
the following question: how many gates are needed for 
an n-qubit stabilizer circuit in the worst case? Cleve 
and Gottesman [31] showed that O {it?) gates suffice for 
the special case of state preparation, and Gottesman |32l | 
and Dehaene and De Moor [i9| showed that O (n^) gates 
suffice for stabilizer circuits more generally; even these 
results were not obvious a priori. However, with the help 
of our canonical form theorem we can show a stronger 
upper bound. 

Corollary 9 Any unitary stabilizer circuit has an equiv- 
alent circuit with only O (n^/logn) gates. 

Proof. Patel, Markov, and Hayes [18] showed that 
any CNOT circuit has an equivalent CNOT circuit with 
only O (n^/logn) gates. So given a stabilizer circuit C, 
ffist put C into canonical form, then minimize the CNOT 
segments. Clearly the Hadamard and Phase segments 
require only O {n) gates each. ■ 

Corollary [9] is easily seen to be optimal by a Shannon 
counting argument: there are 2^" ' distinct stabilizer 
circuits on n qubits, but at most (n^) with T gates. 

A final remark: as noted by Moore and Nilsson [28|, 
any CNOT circuit has an equivalent CNOT circuit with 
O (n^) gates and parallel depth O(logn). Thus, using 
the same idea as in Corollary [51 we obtain that any uni- 
tary stabilizer circuit has an equivalent stabilizer circuit 
with O (n^) gates and parallel depth O (logn). (Moore 
and Nilsson showed this for the special case of stabilizer 
circuits composed of CNOT and Hadamard gates only.) 



VII. BEYOND STABILIZER CIRCUITS 

In this section, we discuss generalizations of stabilizer 
circuits that are still efficiently simulable. The first 
(easy) generalization, in Section IVII Al is to allow the 
quantum computer to be in a mixed rather than a pure 
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state. Mixed states could be simulated by simply pu- 
rifying the state, and then simulating the purification, 
but we present an alternative and slightly more efficient 
strategy. 

The second generalization, in Section FVlI Bl is to initial 
states other than the computational basis state. Taken 
to an extreme, one could even have noncomputable initial 
states. When combined with arbitrary quantum circuits, 
such quantum advice is very powerful, although its ex- 
act power (relative to classical advice) is unknown [33| . 
We consider a more modest situation, in which the initial 
state may include specific ancilla states, consisting of at 
most b qubits each. The initial state is therefore a tensor 
product of blocks of b qubits. Given an initial state of 
this form and general stabilizer circuits, including mea- 
surements and classical feedback based on measurement 
outcomes , universal quantum computation is again possi- 
ble [SJ, [Sal- However, we show that an efficient classical 
simulation exists, provided only a few measurements are 
allowed. 

The final generalization, in Section FVlI Ci is to circuits 
containing a few non-stabilizer gates. The qualifier "few" 
is essential here, since it is known that unitary stabilizer 
circuits plus any additional gate yields a universal set 
of quantum gates [30, [33|- The running time of our 
simulation procedure is polynomial in n, the number of 
qubits, but is exponential in the d, the number of non- 
stabilizer gates. 



A. Mixed States 

We first present the simulation for mixed states. We 
allow only stabilizer mixed states — that is, states that 
are uniform distributions over all states in a subspace (or 
equivalently, all stabilizer states in the subspace) with a 
given stabilizer of r < n generators. Such mixed states 
can always be written as the partial trace of a pure sta- 
bilizer state, which immediately provides one way of sim- 
ulating them. 

It will be useful to see how to write the density ma- 
trix of the mixed state in terms of the stabilizer. The 
operator (/ -|- M) /2, when M is a Pauli operator, is a 
projection onto the -1-1 eigenspace of M. Therefore, if 
the stabilizer of a pure state has generators Mi, . . . , M„, 
then the density matrix for that state is 



1 " 



i=l 



The density matrix for a stabilizer mixed state with sta- 
bilizer generated by Mi , . . . , AI^- is 



= ^f[iI + Mi. 



To perform our simulation, we find a collection of 
2 (n ~ r) operators Xi and Zi that commute with both 



the stabilizer and the destabilizer. We can choose them 
so tha^[X„Xj] = [Z^,Z,] = [X„Z,] = for z ^ j, 
but {^Xi,Zi] = 0. This can be done by solving a set 
of linear equations, which in practice takes time O [n^]- 
If we start with an initial mixed state, we will assume 
it is of the form |00 • • • 0) (00 • • • 0| (g) / (so on the first 
n — r qubits and the completely mixed state on the last 
r qubits). In that case, we choose Xi = Xi^r and 

Zi = Zi^r- 

We could purify this state by adding ZiZn^i and 
XiXn+i to the stabilizer and Xn+i and Zn+i to the desta- 
bilizer for i — 1 , . . . , r. Then we could simulate the sys- 
tem by just simulating the evolution of this pure state 
through the circuit; the extra r qubits are never altered. 

A more economical simulation is possible, however, by 
just keeping track of the original r-generator stabilizer 
and destabilizer, plus the 2 (n — r) operators Xi and Zi. 
Formally, this allows us to maintain a complete tableau 
and generalize the O (n?) tableau algorithm from Sec- 
tion mil We place the r generators of the stabilizer as 
rows 71-1-1,.. .,71 + ?" of the tableau, and the correspond- 
ing elements of the destabilizer as rows 1, . . . ,r. The 
new operators Xi and Zi {i = 1 , . . . , n — r) become rows 
r + i and n + r + i, respectively. Let i = i + niii<n 
and i = i^n[ii>n + l. Then we have that rows Ri 
and Rj commute unless i = j, in which case Ri and Rj 
anticommute. 

We can keep track of this new kind of tableau in much 
the same way as the old kind. Unitary operations trans- 
form the new rows the same way as rows of the stabilizer 
or destabilizer. For example, to perform a CNOT from 
control qubit a to target qubit b, set xn, := xn, ® Xia and 
Zia := Zia ® Zib, for aU i e {1, . . . , 2n}. 

Measurement of qubit a is slightly more complex than 
before. There are now three cases: 

Case I: Xpa = 1 for some p g {n -I- 1, . . . , n -I- r}. In this 
case Za anticommutes with an element of the stabilizer, 
and the measurement outcome is random. We update 
as before, for all rows of the tableau. 
Case II: Xpa = for all p > r. In this case Za is in the 
stabilizer. The measurement outcome is determinate, 
and we can predict the result as before, by calling rowsum 
to add up rows r„-|_i for those i with Xia — 1. 
Case III: Xpa = for all p G {n + I, . . . ,n + r}, 
but Xma = 1 for some m G {r + 1, . . . ,n} or m G 
{n + r + 1, . . . , 27i}. In this case Za commutes with 
all elements of the stabilizer but is not itself in the sta- 
bilizer. We get a random measurement result, but a 
slightly different transformation of the stabilizer than in 
Case I. Observe that row Rm anticommutes with Za- 
This row takes the role of row p from Case I, and the 
row Rjn takes the role of row p — n. Update as before 
with this modification. Then swap rows n + r + I and 
m and rows r + 1 and m. Finally, increase r to r + 1: 
the stabilizer has gained a new generator. 

Another operation that we might want to apply is dis- 
carding the qubit a, which has the effect of performing 
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a partial trace over that qubit in the density matrix. 
Again, this can be done by simply keeping the qubit 
in our simulation and not using it in future operations. 
Here is an alternative: put the stabilizer in a form such 
that there is at most one generator with an X on qubit 
a, and at most one with a Z on qubit a. Then drop 
those two generators (or one, if there is only one total). 
The remaining generators describe the stabilizer of the 
reduced mixed state. We also must put the Xi and Zi 
operators in a form where they have no entries in the 
discarded location, while preserving the structure of the 
tableau (namely, the commutation relations of Proposi- 
tion [3]). This can also be done in time 0{ti?), but we 
omit the details, as they are rather involved. 



B. Non-Stabilizer Initial States 

We now show how to simulate a stabilizer circuit where 
the initial state is more general, involving non-stabilizer 
initial states. We allow any number of ancillas in arbi- 
trary states, but the overall ancilla state must be a tensor 
product of blocks of at most b qubits each. An arbitrary 
stabilizer circuit is then applied to this state. We allow 
measurements, but only d of them in total throughout 
the computation. We do allow classical operations con- 
ditioned on the outcomes of measurements, so we also 
allow polynomial-time classical computation during the 
circuit. 

Let the initial state have density matrix p: a tensor 
product of m blocks of at most h qubits each. Without 
loss of generality, we first apply the unitary stabilizer 
circuit Ui, followed by the measurement Zi (that is, a 
measurement of the first qubit in the standard basis). 
We then apply the stabilizer circuit U2 , followed by mea- 
surement Z2 on the second qubit, and so on up to Ud, Zd- 

We can calculate the probability p (0) of obtaining out- 
come for the first measurement Zi as follows: 



p(0)=Tr iI + Zi)UipUl /2 



I + UlZiUi)p\/2 



= Tr 



= l/2 + Tr^(UlZiUijp^ /2. 

But Ui is a stabilizer operation, so U{ZiUi is a Pauli 
matrix, and is therefore a tensor product operation. We 
also know p is a tensor product of blocks of at most b 
qubits, and the trace of a tensor product is the product 
of the traces. Let p = '^JLiPj and UlZiUi - 
where j ranges over the blocks. Then 



m p 



^ m 



Since Pj and pj are both 2^ x 2''-dimensional matrices, 
each Tr (PjPj) can be computed in time O (2^**). 



By flipping an appropriately biased coin, Alice can gen- 
erate an outcome of the first measurement according to 
the correct probabilities. Conditioned on this outcome 
(say of 0), the state of the system is 

{I + Z^)U,pUl{l + Z,) 
4p(0) 

After the next stabilizer circuit U2, the state is 

U2iI + Zi)UipUl{l + Zi)ul 
4p(0) 

The probability of obtaining outcome for the second 
measurement, conditioned on the outcome of the first 
measurement being 0, is then 



p(0|0) 



Tr 



(/ + Z2) U2 {I + Zi) UipUl {I + Zi) ul 



'(0) 



By expanding out the 8 terms, and then commuting Ui 
and U2 past Zi and Z2, we can write this as 



8 m 



En^Kv.) 



Each Tr I P^ pij 1 term can again be computed in time 

0(22^). 

Similarly, the probability of any particular sequence of 
measurement outcomes mim2 ■ ■ ■ rud can be written as a 
sum 



p {mim,2 ■ ■ ■ nid) = E II "^^ {^^j P^j) ' 

where each trace can be computed in time O (2^'') . It 
follows that the probabilities of the two outcomes of the 
d*^ measurement can be computed in time O (7712^^+^'') . 



C. Non-Stabilizer Gates 

The last case that we consider is that of a circuit con- 
taining d non-stabilizer gates, each of which acts on at 
most b qubits. We allow an unlimited number of Pauli 
measurements and unitary stabilizer gates, but the initial 
state is required to be a stabilizer state — for concreteness, 
|0)®". 

To analyze this case, we examine the density matrix 
pt at the t^^ step of the computation. Initially, po is 
a stabilizer state whose stabilizer is generated by some 
Ml , . . . , M„ , so we can write it as 



P 



1 

2" 



(/ + Afi)(/-fM2)---(/ + Af„) 



If we perform a stabilizer operation, the Af^'s become a 
different set of Pauli operators, but keeping track of them 
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requires at most n (2n + 1) bits at any given time (or 
2n {2n + 1) if we include the destabilizer). If we perform 
a measurement, the M^'s change in a more comphcated 
way, but remain Pauh group elements. 

Now consider a single non-stabilizer gate U. Expand- 
ing U in terms of Pauli operations Pi , 



l-Y^c,clP.,Pu\{il+{-l 



xMj-Pk 



Ah 



Here Mj ■ Pk is the symplectic inner product between the 
corresponding vectors, which is whenever Mj and P^ 
commute and 1 when they anticommute. In what fol- 
lows, let Cik — CiC*!. and Pik = PtPk- Then we can write 
the density matrix after [/ as a sum of terms, each de- 
scribed by a Pauli matrix Pik and a vector of eigenvalues 
for the stabilizer. Since U and W each act on at most b 
qubits, there are at most 4^^ terms in this sum. 

If we apply a stabilizer gate to this state, all of the 
Pauli matrices in the decomposition are transformed to 
other Pauli matrices, according to the usual rules. If we 
perform another non-stabilizer gate, we can again expand 
it in terms of Pauli matrices, and put it in the same form. 
The new gate can act on b new qubits, however, giving 
us more terms in the sum. After d such operations, we 
thus need to keep track of at most 4^*"* complex numbers 
(the coefficients Cik), 4'"' strings each of 2n bits (the Pauli 
matrices Pik), and 4'"^ strings each of n bits (the inner 
products Mj ■ Pk). We also need to keep track of the 
stabilizer generators Mi, . . . ,M„, and it will be helpful 
to also keep track of the destabilizer, for a total of an 
additional 2n (2n -I- 1) bits. 

The above allows us to describe the evolution when 
there are no measurements. What happens when we 
perform a measurement? Consider the unnormalized 
density matrix corresponding to outcome for measure- 
ment of the Pauli operator Q: 



P(0) 



1 



:Q+ J2 ^^^P^^ n (^ + (-1)'"^-'^^^^.) Q' 



i,k 



where here and throughout we let Q+ = / + Q and Q^ = 
I — Q. As usual, either Q commutes with everything 
in the stabilizer, or Q anticommutes with some element 
of the stabilizer. (However, the measurement outcome 
can be indeterminate in both cases, and may have a non- 
uniform distribution.) In the first case, we can rewrite 
the density matrix as 



P{0) 



2n+2 Z^* 
i.k 



■.kQ + P^kQ+Y[( I +i-l 



■.MyPk 



M, 



But Q^PikQ^ = 2PikQ^ if Pik and Q commute, and 
Q^PikQ^ = Q^QPik = if Pik and Q anticommute. 
Furthermore, as usual, as Q commutes with everything in 



the stabilizer, Q is actually in the stabilizer, so projecting 
on Q^ either is redundant (if Q has eigenvalue -1-1) or 
annihilates the state (if Q has eigenvalue — 1 ) . Therefore, 
we can see that p (0) has the same form as before: 



Pi^)^^.T.^^kP^u\{iI+{-l 



\M,-P,. 



AL 



i,k 



where now the sum over i is only over those P^/c's that 
commute with Q, and the sum over k is only over those 
Pfc's that give eigenvalue -1-1 for Q. 

When Q anticommutes with an element of the stabi- 
lizer, we can change our choice of generators so that Q 
commutes with all of the generators except for AIi . Then 
we write p (0) as: 



P(0) 



1 



2"+2 



2"+2 



J2 c^fcQ+^.fc {l + i-l)^''-^' Afi) Q+Ak 

i.k 



^ CikQ+Pik 



i.k 



Q+ + {-l)^'^-''''Q-Ah 



A, 



where 



Ak 



Yl (l+{-l)^'^''^''Mj 



If Pik and Q commute, then we keep only the first term 
(5+ in the square brackets. If Pik and Q anticommute, we 
keep only the second term Q~Mi in the square brackets. 
In either case, we can rewrite the density matrix in the 
same kind of decomposition: 



i,k j>l 



.MyPk 



' M, 



where Q has replaced Mi in the stabilizer, and any Pik 
that anticommutes with Q has been replaced by Pik Mi, 
its corresponding Cik replaced by (—1) ^ '' Cik- 

Therefore, we can always write the density matrix after 
the measurement in the same kind of sum decomposition 
as before, with no more terms than there were before the 
measurement. The density matrices are unnormalized, 
so we need to calculate Tr p (0) to determine the proba- 
bility of obtaining outcome 0. Computing the trace of 
a single term is straightforward: it is if Pik is not in 
the stabilizer and ±2"cife if Pik is in the stabilizer (with 
+ or — determined by the eigenvalue of Pik)- To cal- 
culate Trp(O), we just need to sum the traces of the 
^2bd individual terms. We then choose a random number 
to determine the actual outcome. Thereafter, we only 
need to keep track of p (0) or p (1), which we can easily 
renormalize to have unit trace. Overall, this simulation 



26d, 



therefore takes time and space O (4 



VIII. OPEN PROBLEMS 

(1) Iwama, Kambayashi, and Yamashita !27| gave a set 
of local transformation rules by which any CNOT cir- 
cuit (that is, a circuit consisting solely of CNOT gates) 
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can be transformed into any equivalent CNOT circuit. 
For example, a CNOT from a to 6 followed by another 
CNOT from a to 6 can be replaced by the identity, and a 
CNOT from a to 6 followed by a CNOT from c to d can 
be replaced by a CNOT from c to d followed by a CNOT 
from a to 6, provided that a ^ d and h ^ c. Using 
Theorem |H1 can we similarly give a set of local transfor- 
mation rules by which any unitary stabilizer circuit can 
be transformed into any equivalent unitary stabilizer cir- 
cuit? Such a rule set could form the basis of an efficient 
heuristic algorithm for minimizing stabilizer circuits. 

(2) Can the tableau algorithm be modified to compute 
measurement outcomes in only O (n) time? (In case 
the measurement yields a random outcome, updating the 
state might still take order n^ time.) 

(3) In Theorem [HI is the 11-round sequence H-C-P-C- 
P-C-H-P-C-P-C really necessary, or is there a canonical 
form that uses fewer rounds? Note that if we are only 
concerned with state preparation, and not with how a cir- 
cuit behaves on any initial state other than the standard 
one, then the 5-round sequence H-P-C-P-H is sufficient. 

(4) Is there a set of ciuantum gates that is neither univer- 
sal for quantum computation, nor classically simulable in 
polynomial time? Shi [38| has shown that if we gener- 
alize stabilizer circuits by adding any 1- or 2-qubit gate 
not generated by CNOT, Hadamard, and phase, then we 
immediately obtain a universal set. 

(5) What is the computational power of stabilizer circuits 
with arbitrary tensor product initial states, but measure- 
ments delayed until the end of the computation? It is 
known that, if we allow classical postprocessing and con- 



trol of future quantum operations conditioned on mea- 
surement results, then universal quantum computation 
is possible [3l,[3^. However, if all measurements are de- 
layed until the end of the computation, then the quantum 
part of such a circuit (though not the classical postpro- 
cessing) can be compressed to constant depth. On the 
other hand, Terhal and DiVincenzo 39] have given evi- 
dence that even constant-depth quantum circuits might 
be difficult to simulate classically. 

(6) Is there an efficient algorithm that, given a CNOT 
or stabilizer circuit, produces an equivalent circuit of 
(approximately) minimum size? Would the existence 
of such an algorithm have unlikely complexity conse- 
quences? This might be related to the hard problem 
of proving superlinear lower bounds on CNOT or stabi- 
lizer circuit size for explicit functions. 
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